*----------------------------------------------------------------------------------------------------------	* 
* Silencing the Rails: A Study of the Noise-Safety Trade-off in Railway Quiet Zones                         *
* RESEARCHERS:		Emtiaz Hritan																		    *
* PROGRAMMED BY:	Emtiaz Hritan 					 											            *
* CREATED:			Oct. 25, 2023																		   	*
* LAST MODIFIED:	Oct. 7, 2025														       				*
*----------------------------------------------------------------------------------------------------------	*

clear all
set more off
* Set local paths
* Set this local datapath equal to the folder location for data
	local datapath "C:\Users\name\Replication Files Silencing the Rails\Data"
* Set this local outputpath equal to the folder location for output such as tables
	local outputpath "C:\Users\name\Replication Files Silencing the Rails\Outcome"

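* Install necessary packages
* estout provides the eststo, estpost, and esttab commands used for the tables below
ssc install estout, replace
ssc install reghdfe, replace
ssc install ftools, replace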
ssc install grstyle, replace
ssc install palettes, replace
ssc install colrspace, replace
ssc install ppmlhdfe, replace 
ssc install jwdid, replace 
ssc install csdid, replace 
ssc install drdid, replace 
ssc install did_imputation, replace
ssc install frause, replace
frause mpdta.dta, clear
ssc install hdfe, replace 
ssc install event_plot, replace
ssc install addplot, replace






***************************************************
*Table 1—: Summary Statistics for Quiet Zones
***************************************************
cd "`datapath'"
*********************************Balance test for not yet treated*********

* Let's create a balance test table with means, standard deviations, and p-values for the mean differences of important characteristics of railway crossings between two groups (e.g., a treatment group and a control group)
clear all 
use final 

* Crossing IDs are strings. Convert them to a numeric group identifier:
egen crossingid = group(crossing_id)

* Variables included in this balance test: crossing characteristics (publicprivate, crossingwarninglocation, crossingilluminated). Generate dummy variables:
tab publicprivate
gen public=0
replace public=1 if publicprivate=="Public"

tab crossingwarninglocation
gen both_side_warning=0
replace both_side_warning=1 if crossingwarninglocation=="Both sides"

tab crossingilluminated

gen illuminated=.
replace illuminated=1 if crossingilluminated=="Yes"
replace illuminated=0 if crossingilluminated=="No"


* Variables included in this balance test: accident variables (ampm, time, estimatedvehiclespeed, temperature, visibility, weathercondition, trainspeed, viewobstruction, crossinguserskilled, crossingusersinjured, vehicledamagecost, employeeskilled, employeesinjured, passengersinjured)
tab ampm
gen am =0
replace am=1 if ampm=="AM"

tab visibility
generate dark=0
replace dark=1 if visibility=="Dark"

tab weathercondition
generate clear_weather=0
replace clear_weather=1 if weathercondition=="Clear"

tab viewobstruction
generate view_obstruction=1
replace view_obstruction=0 if viewobstruction=="Not obstructed"
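
* Optional check (a sketch, not part of the original analysis): the dummies above are
* built from string variables, and a blank string falls into the default value
* (0 for public, both_side_warning, am, dark, and clear_weather; 1 for view_obstruction)
* rather than being coded as missing. The loop below only counts blank entries per
* source variable; it assumes these six variables exist as loaded from final.dta.
foreach v of varlist publicprivate crossingwarninglocation ampm visibility weathercondition viewobstruction {
	quietly count if missing(`v')
	display "`v': " r(N) " blank or missing values"
}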

* Apart from monthly and yearly accidents per crossing, I can calculate the mean, sd, observations and mean difference of all other variables. I can add the monthly and yearly accidents per crossing later in the .tex table. Let's get the balance for dummy variables. 

* The control group here is the "Not Yet Treated" crossings. First drop the "Never Treated" crossings so that only ever-treated crossings remain, then compare observations dated after a crossing's quiet zone start with observations dated before it.
* Define first_treat: 0 if never treated, and the date first treated (WhistleDate) if ever treated.
gen first_treat= WhistleDate
replace first_treat=0 if quietzone ==0 
drop if first_treat==0
format %td first_treat
drop if first_treat==.
* Control is defined as the railway crossing that's yet to be treated
gen treated=0
replace treated=1 if date > first_treat
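* Optional check (a sketch, not part of the original analysis): Stata ranks a missing
* value above every number, so an observation with a missing date satisfies
* date > first_treat and is coded treated = 1. The count shows whether any such rows exist.
count if missing(date)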

* Drop data before 02 November, 1994
drop if date< mdy(11,2,1994)

* Check for balance between the treatment and control groups
eststo control: quietly estpost summarize public both_side_warning illuminated am dark clear_weather view_obstruction estimatedvehiclespeed temperature trainspeed crossinguserskilled crossingusersinjured vehicledamagecost employeeskilled employeesinjured passengersinjured if treated == 0
eststo treatment: quietly estpost summarize public both_side_warning illuminated am dark clear_weather view_obstruction estimatedvehiclespeed temperature trainspeed crossinguserskilled crossingusersinjured vehicledamagecost employeeskilled employeesinjured passengersinjured if treated == 1 
eststo diff: quietly estpost ttest public both_side_warning illuminated am dark clear_weather view_obstruction estimatedvehiclespeed temperature trainspeed crossinguserskilled crossingusersinjured vehicledamagecost employeeskilled employeesinjured passengersinjured, by(treated) unequal
esttab control treatment diff,cells("mean(pattern(1 1 0) fmt(3)) sd(pattern(1 1 0)) b(star pattern(0 0 1) fmt(3) label(Difference)) p(pattern(0 0 1) par fmt(3) label(p-value))") label

* The file name contains a space, so it must be quoted; replace allows the do-file to be rerun
esttab control treatment diff using "balance_not yet.tex", replace tex cells("mean(pattern(1 1 0) fmt(3)) sd(pattern(1 1 0)) b(star pattern(0 0 1) fmt(3) label(Difference)) p(pattern(0 0 1) par fmt(3) label(p-value))") label
eststo clear

*--------------------------------------------------------------------------
* END
*--------------------------------------------------------------------------


***************************************************
* Table A1—: Summary Statistics
***************************************************

* Let's create a balance test table with means, standard deviations, and p-values for the mean differences of important characteristics of railway crossings between two groups (e.g., a treatment group and a control group)
clear all 
use final 

* Generate treatment and time dummy variable
* Treatment is defined as a railway crossing with an established quiet zone
gen treated =0 
replace treated =1 if quietzone==1
 
* Crossing IDs are strings. Convert them to a numeric group identifier:
egen crossingid = group(crossing_id)

* Variables included in this balance test: crossing characteristics (publicprivate, crossingwarninglocation, crossingilluminated). Generate dummy variables:
tab publicprivate
gen public=0
replace public=1 if publicprivate=="Public"

tab crossingwarninglocation
gen both_side_warning=0
replace both_side_warning=1 if crossingwarninglocation=="Both sides"

tab crossingilluminated

gen illuminated=.
replace illuminated=1 if crossingilluminated=="Yes"
replace illuminated=0 if crossingilluminated=="No"


* Variables included in this balance test: accident variables (ampm, time, estimatedvehiclespeed, temperature, visibility, weathercondition, trainspeed, viewobstruction, crossinguserskilled, crossingusersinjured, vehicledamagecost, employeeskilled, employeesinjured, passengersinjured)
tab ampm
gen am =0
replace am=1 if ampm=="AM"

tab visibility
generate dark=0
replace dark=1 if visibility=="Dark"

tab weathercondition
generate clear_weather=0
replace clear_weather=1 if weathercondition=="Clear"

tab viewobstruction
generate view_obstruction=1
replace view_obstruction=0 if viewobstruction=="Not obstructed"

* Apart from monthly and yearly accidents per crossing, I can calculate the mean, sd, observations and mean difference of all other variables. I can add the monthly and yearly accidents per crossing later in the .tex table. Let's get the balance for dummy variables. 

* Drop data before 1995
drop if year< 1995

* Check for balance between the treatment and control groups
eststo control: quietly estpost summarize public both_side_warning illuminated am dark clear_weather view_obstruction estimatedvehiclespeed temperature trainspeed crossinguserskilled crossingusersinjured vehicledamagecost employeeskilled employeesinjured passengersinjured if treated == 0
eststo treatment: quietly estpost summarize public both_side_warning illuminated am dark clear_weather view_obstruction estimatedvehiclespeed temperature trainspeed crossinguserskilled crossingusersinjured vehicledamagecost employeeskilled employeesinjured passengersinjured if treated == 1 
eststo diff: quietly estpost ttest public both_side_warning illuminated am dark clear_weather view_obstruction estimatedvehiclespeed temperature trainspeed crossinguserskilled crossingusersinjured vehicledamagecost employeeskilled employeesinjured passengersinjured, by(treated) unequal
esttab control treatment diff,cells("mean(pattern(1 1 0) fmt(3)) sd(pattern(1 1 0)) b(star pattern(0 0 1) fmt(3) label(Difference)) p(pattern(0 0 1) par fmt(3) label(p-value))") label

esttab control treatment diff using summarystats.tex, replace tex cells("mean(pattern(1 1 0) fmt(3)) sd(pattern(1 1 0)) b(star pattern(0 0 1) fmt(3) label(Difference)) p(pattern(0 0 1) par fmt(3) label(p-value))") label
eststo clear

*--------------------------------------------------------------------------
* END
*--------------------------------------------------------------------------



***************************************************
* Table A2—: Summary Statistics for Pre-Quiet Zone Rule
***************************************************


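* Restrict to the pre quiet zone rule period (before 6/24/2005).
* This block continues from the Table A1 data still in memory, so the dummy variables, treated, and the 1995 cutoff all carry over.

// First, convert June 24, 2005, to Stata numeric date format
gen date_to_compare = mdy(6, 24, 2005)

// Now, keep only observations dated before June 24, 2005
keep if date < date_to_compare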
* Check for balance between the treatment and control groups
eststo control: quietly estpost summarize public both_side_warning illuminated am dark clear_weather view_obstruction estimatedvehiclespeed temperature trainspeed crossinguserskilled crossingusersinjured vehicledamagecost employeeskilled employeesinjured passengersinjured if treated == 0
eststo treatment: quietly estpost summarize public both_side_warning illuminated am dark clear_weather view_obstruction estimatedvehiclespeed temperature trainspeed crossinguserskilled crossingusersinjured vehicledamagecost employeeskilled employeesinjured passengersinjured if treated == 1 
eststo diff: quietly estpost ttest public both_side_warning illuminated am dark clear_weather view_obstruction estimatedvehiclespeed temperature trainspeed crossinguserskilled crossingusersinjured vehicledamagecost employeeskilled employeesinjured passengersinjured, by(treated) unequal
esttab control treatment diff,cells("mean(pattern(1 1 0) fmt(3)) sd(pattern(1 1 0)) b(star pattern(0 0 1) fmt(3) label(Difference)) p(pattern(0 0 1) par fmt(3) label(p-value))") label

esttab control treatment diff using summarypre.tex, replace tex cells("mean(pattern(1 1 0) fmt(3)) sd(pattern(1 1 0)) b(star pattern(0 0 1) fmt(3) label(Difference)) p(pattern(0 0 1) par fmt(3) label(p-value))") label
eststo clear



*--------------------------------------------------------------------------
* END
*--------------------------------------------------------------------------

********************************************************
* Table A3—: Determinants of Quiet Zone Adoption at Railroad Crossings 
********************************************************
* Test what drives the geographic bunching of quiet zones, i.e., which characteristics predict this endogenous selection into treatment.
* This analysis needs a dataset with intercensal demographic data, BLS labor market data, crossing and quiet zone data, and accident data at the county level.
* First import the BLS data and merge it with the intercensal data
use bls_county_1990_2024
* Drop data from 2020 onward (the COVID period and after)
drop if year>2019
merge 1:1 county_fips year using county_1990_2019.dta, force 
*keep all counties from both files for now
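* Optional check (a sketch, not part of the original analysis): tabulate the merge result
* before it is dropped to see how many county-years come from only one of the two files.
tab _merge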
drop _merge
gen merge_year= year
save county_bls_demo.dta, replace 

* Let's get accident data at the yearly level 
clear all  
use final
drop if year<2000
drop if year >2019 
collapse (sum) accident if !missing(year), by(crossing_id year)
duplicates report crossing_id year 
duplicates drop crossing_id year, force 
gen merge_year= year
tempfile crossing_accidents
save `crossing_accidents'

* Import the all-crossings data, then merge in the quiet zone and accident data
clear all
use all_crossings
merge m:1 crossing_id using quietzone.dta, force 
replace quietzone=0 if quietzone==.
drop _merge

* Build one variable each for the number of daytime and nighttime trains, filling gaps from the alternative source columns
gen day_train_num=daythru 
replace day_train_num=TotalDaylightThruTrains if daythru==.

gen night_train_num=nghtthru
replace night_train_num=TotalNighttimeThruTrains if nghtthru==.
rename countycode county_fips

gen whistle_year= year(WhistleDate)
gen merge_year = whistle_year - 1 if !missing(whistle_year)
replace merge_year = 2010 if missing(whistle_year)  // common year for untreated
replace quietzone=0 if missing(whistle_year)
count if missing(merge_year)
merge m:1 county_fips merge_year using county_bls_demo.dta, force
keep if _merge==3
drop _merge

* Merge with accident data 
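* Optional check (a sketch, not part of the original analysis): the 1:1 merge below needs
* crossing_id and merge_year to identify each row of the data in memory uniquely.
* duplicates report lists any surplus copies without stopping the do-file.
duplicates report crossing_id merge_year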

merge 1:1 crossing_id merge_year using `crossing_accidents', force 
* Drop accident records that matched no crossing (using-only observations)
drop if _merge==2
drop _merge
replace accident=0 if accident==. 

* Generate nonwhite population
gen non_white= tot_pop - white_pop 

* Generate total population aged 65 or older
gen age_65= (tot_pop_Age_65_to_69 + tot_pop_Age_70_to_74 + tot_pop_Age_75_to_79 + tot_pop_Age_80_to_84 + tot_pop_Age_85_plus)
* Switch to shares or percentages instead of raw counts to reduce multicollinearity
gen female_share = tot_female / tot_pop
gen nonwhite_share = non_white / tot_pop
gen age65_share   = age_65 / tot_pop
gen black_share =black_pop / tot_pop
gen hispanic_share =hispanic_pop / tot_pop

gen under_10 = tot_pop_Age_0_to_4 + tot_pop_Age_5_to_9
gen under10_share = under_10 / tot_pop

* Note: the age bins summed here span ages 5 to 19, although the variable is named children_5_17
gen children_5_17 = tot_pop_Age_5_to_9 + tot_pop_Age_10_to_14 + tot_pop_Age_15_to_19
gen children_share = children_5_17 / tot_pop
* Rescale population 
gen total_population= tot_pop/100000




local controls accident day_train_num night_train_num UnemploymentRate total_population female_share nonwhite_share children_share

logit quietzone `controls', vce(cluster county_fips)
margins, dydx(*) post
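* Optional display (a sketch, not part of the original analysis): because margins was run
* with the post option, the average marginal effects are now the active estimates, so
* esttab can print them with standard errors. No file is written here.
esttab, b(3) se(3)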

* Check for multicollinearity with variance inflation factors (VIF) after a linear version of the model:
local controls accident day_train_num night_train_num UnemploymentRate total_population female_share nonwhite_share children_share
reg quietzone `controls'
vif 

eststo clear
*--------------------------------------------------------------------------
* END
*--------------------------------------------------------------------------


***************************************************
* Table A4—: Comparison of Train Volumes by Quiet Zone Status
***************************************************
* Import the all-crossings data and merge in quiet zone status
clear all
use all_crossings
merge m:1 crossing_id using quietzone.dta, force 
replace quietzone=0 if quietzone==.
drop _merge

* Build one variable each for the number of daytime and nighttime trains, filling gaps from the alternative source columns
gen day_train_num=daythru 
replace day_train_num=TotalDaylightThruTrains if daythru==.

gen night_train_num=nghtthru
replace night_train_num=TotalNighttimeThruTrains if nghtthru==.


* Drop negative values
drop if night_train_num <0 
drop if day_train_num <0 

* Total train traffic
gen total_trains = day_train_num + night_train_num
 

 
* Check for balance between the treatment and control groups
eststo control: quietly estpost summarize night_train_num day_train_num total_trains if quietzone == 0
eststo treatment: quietly estpost summarize night_train_num day_train_num total_trains if quietzone == 1 
eststo diff: quietly estpost ttest night_train_num day_train_num total_trains, by(quietzone) unequal
esttab control treatment diff,cells("mean(pattern(1 1 0) fmt(3)) sd(pattern(1 1 0)) b(star pattern(0 0 1) fmt(3) label(Difference)) p(pattern(0 0 1) par fmt(3) label(p-value))") label 

eststo clear


*--------------------------------------------------------------------------
* END
*--------------------------------------------------------------------------



















